Introduction

According to the World Health Organization, about 800,000 people die by suicide every year. In this project, using a joint dataset from the United Nations Development Program, World Bank, Kaggle, and World Health Organization, we examined current trends in suicide rates. In particular, we are interested in:

Data

The main dataset for our project combines summary datasets produced by the United Nations Development Program, World Bank, Kaggle, and World Health Organization. It can be accessed here. The dataset spans 1985 to 2016. However, because there are very few observations for 2016, we keep only the years 1985 to 2015. The raw dataset has 27660 observations and 8 features. The basic features we are interested in include:

In addition, we derive our main variable of interest, Suicides Per 100K, as Suicides_no divided by Population and multiplied by 100,000. A sample of the final dataset is shown below:

year  country              sex     age    population  gdp_per_capita  suicides_no  suicide_per_100k
1985  Antigua and Barbuda  female  15-24        7709            3850            0                 0
1985  Antigua and Barbuda  female  25-34        6344            3850            0                 0
1985  Antigua and Barbuda  female  35-54        6173            3850            0                 0
1985  Antigua and Barbuda  female  5-14         7339            3850            0                 0
1985  Antigua and Barbuda  female  55-74        3778            3850            0                 0
1985  Antigua and Barbuda  female  75+           949            3850            0                 0
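
The derivation of the rate described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `toy` is hypothetical, its first two rows reuse values from the sample, and its third row is made up so that one rate is non-zero.

```r
# Hypothetical toy data: rows 1-2 reuse sample values, row 3 is invented
toy <- data.frame(
  suicides_no = c(0, 0, 12),
  population  = c(7709, 6344, 240000)
)

# Suicides per 100k = suicides_no / population * 100,000
toy$suicide_per_100k <- toy$suicides_no / toy$population * 100000

print(toy)
```

Inside a dplyr pipeline the same step would typically be a single `mutate()` call; base R is used here only to keep the sketch self-contained.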

Statistical Analysis

Regression analysis will be the main method in our study.

Results

Global Trend of Suicide Per 100k Population over time

The global suicide rate increased until 1995 and has been decreasing since then.
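
As a rough sketch of how a yearly global rate can be obtained, the counts can be pooled within each year before dividing, which is the same sum-over-sum form used in the grouped code of this report. The data frame `toy` and all of its numbers are hypothetical and serve only to show the calculation.

```r
# Hypothetical toy data: two population groups observed in two years
toy <- data.frame(
  year        = c(1985, 1985, 1986, 1986),
  suicides_no = c(10, 30, 20, 30),
  population  = c(100000, 900000, 200000, 800000)
)

# Sum suicides and population within each year
totals <- aggregate(cbind(suicides_no, population) ~ year, data = toy, FUN = sum)

# Pooled rate per 100k: total suicides / total population * 100,000
totals$suicide_per_100k <- totals$suicides_no / totals$population * 100000

print(totals)
```

Pooling the counts weights each group by its population, whereas averaging the per-group rates would give small groups the same weight as large ones.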

Global Trend of Suicide Per 100k Population by gender over time

We found that, surprisingly, males have had a higher suicide rate than females in every year since 1985. The female suicide rate has been very stable over the whole period, while the male rate has changed dramatically.

library(dplyr)
library(ggplot2)
library(gganimate)

# Pool counts within each year and sex, then convert to a rate per 100k
p <- maindata %>%
  group_by(year, sex) %>%
  summarize(suicide_per_100k = (sum(as.numeric(suicides_no)) / sum(as.numeric(population))) * 100000,
            .groups = "drop") %>%
  ggplot(aes(x = year, y = suicide_per_100k, col = factor(sex))) +
  geom_line() +
  geom_point() +
  labs(title = "Trends Over Time, by Sex",
       x = "Year",
       y = "Suicides per 100k",
       color = "Sex") +
  scale_x_continuous(breaks = seq(1985, 2015, 5), minor_breaks = NULL)

# Reveal the lines progressively along the year axis
p + transition_reveal(year)

Global Trend of Suicide Per 100k Population by age over time

Suicide rates for the youngest age group have been low and nearly constant over time. As the graph shows, older age groups have had higher suicide rates since 1985, and, surprisingly, this pattern has not changed at any point.

# Assumes dplyr, ggplot2, and gganimate are loaded
# Pool counts within each year and age group, then convert to a rate per 100k
p <- maindata %>%
  group_by(year, age) %>%
  summarize(suicide_per_100k = (sum(as.numeric(suicides_no)) / sum(as.numeric(population))) * 100000,
            .groups = "drop") %>%
  ggplot(aes(x = year, y = suicide_per_100k, col = factor(age))) +
  geom_line() +
  geom_point() +
  labs(title = "Trends Over Time, by Age",
       x = "Year",
       y = "Suicides per 100k",
       color = "Age") +
  scale_x_continuous(breaks = seq(1985, 2015, 5), minor_breaks = NULL)

# Reveal the lines progressively along the year axis
p + transition_reveal(year)

Suicide Rate by Country GDP

GDP is widely viewed as a good measure of a country's development. However, the graph below shows no obvious relationship between GDP and suicide rate. Although GDPs across the world have shifted toward higher values over time, this pattern persists.

Countries with the most suicides across the years

Inference

The distribution plot of suicide_100k_pop shows that the variable needs to be transformed to satisfy the assumptions of the linear model. We applied a log transformation, first changing 0’s to 0.01 so that the logarithm is defined. The following graphs show that, after transformation, the distribution is much closer to normal than before.
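
A minimal sketch of the transformation described above, assuming the rates are held in a numeric vector. The vector `rates` and its values are hypothetical; only the replacement value 0.01 and the name `log_suicide` come from this report.

```r
# Hypothetical rates per 100k, including exact zeros
rates <- c(0, 0.5, 3.2, 12.8, 0, 41.0)

# Replace exact zeros with 0.01 so that the logarithm is finite
adjusted <- ifelse(rates == 0, 0.01, rates)

# Natural log of the adjusted rates
log_suicide <- log(adjusted)

print(log_suicide)
```

Every replaced zero maps to the same value, log(0.01), which is about -4.61, so observations with no recorded suicides form a cluster at the left edge of the transformed distribution.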

## 
## Call:
## lm(formula = log_suicide ~ year + sex * age + gdp_per_capita, 
##     data = maindata_log_y)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7101 -0.6122  0.1882  0.6922  3.2761 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.041e+01  1.614e+00  12.649  < 2e-16 ***
## year             -9.692e-03  8.074e-04 -12.005  < 2e-16 ***
## sexmale           1.097e+00  3.133e-02  35.021  < 2e-16 ***
## age25-34          8.270e-02  3.133e-02   2.640 0.008300 ** 
## age35-54          2.699e-01  3.133e-02   8.616  < 2e-16 ***
## age5-14          -1.625e+00  3.133e-02 -51.873  < 2e-16 ***
## age55-74          3.531e-01  3.133e-02  11.272  < 2e-16 ***
## age75+            4.404e-01  3.133e-02  14.057  < 2e-16 ***
## gdp_per_capita    6.678e-06  3.604e-07  18.530  < 2e-16 ***
## sexmale:age25-34  2.767e-01  4.431e-02   6.246 4.26e-10 ***
## sexmale:age35-54  2.562e-01  4.431e-02   5.783 7.42e-09 ***
## sexmale:age5-14  -8.449e-01  4.431e-02 -19.070  < 2e-16 ***
## sexmale:age55-74  1.647e-01  4.431e-02   3.718 0.000201 ***
## sexmale:age75+    2.615e-01  4.431e-02   5.903 3.60e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.064 on 27646 degrees of freedom
## Multiple R-squared:  0.5109, Adjusted R-squared:  0.5107 
## F-statistic:  2221 on 13 and 27646 DF,  p-value: < 2.2e-16
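
Because the response is on the log scale, each coefficient can be read as a multiplicative effect on the suicide rate once it is exponentiated. The sketch below does this for three estimates copied from the summary above. The variable names `coefs`, `ratios`, and `pct_change` are illustrative, and the reading is approximate because zeros were replaced with 0.01 before taking logs.

```r
# Estimates copied from the model summary (log scale)
coefs <- c(year = -9.692e-03, sexmale = 1.097, gdp_per_capita = 6.678e-06)

# exp(coefficient) is the multiplicative change in the rate
# for a one-unit increase in the predictor
ratios <- exp(coefs)

# The same effects expressed as percent changes
pct_change <- (ratios - 1) * 100

print(round(ratios, 4))
print(round(pct_change, 2))
```

Under this reading, the rate falls by a little under 1% per year, and males have roughly three times the rate of females. Because the model includes a sex by age interaction, the sexmale estimate applies to the reference age group (15-24); for other age groups the matching interaction term must be added before exponentiating.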